Abstract

A new scheme to represent phonological changes during
continuous speech recognition is suggested. A phonological
tag coupled with its morphological tag is designed to
represent the conditions of Korean phonological changes.
A pairwise language model of these morphological and
phonological tags is implemented in a Korean speech
recognition system. Performance of the model is verified
through TDNN-based speech recognition experiments.

1 Introduction 

The most widely used language models in speech recognition
are word-level models, such as word-pairs and word-bigrams
(Lee, 1989) (Bates et al., 1993) (Agnas et al., 1994).
However, these models take too much space and need a large
corpus to be trained correctly. They are also domain
dependent, so it is hard to add new vocabulary. To cope
with these problems, several category-level language models
have been suggested (Jardino, 1994) (Yang et al., 1994).
These models include word-category models based on the
first and last syllables of the words, and models that use
an automatic categorization technique to reduce the
perplexity. The category-level language models showed a
reduction in space requirements and better domain
independence. For agglutinative languages, several morpheme
category/tag-level models have also been suggested
(Sakai, 199i) (Nakata, 1994). These models are basically
the same as the ones used in text tagging systems, and use
bigram/trigram statistics between tags.

However, Korean has many phonological changes that happen
within a morpheme and between morphemes, and those changes
result in a disparity between the phonetic and orthographic
descriptions of the morphemes. To cope with the
phonological changes during Korean speech recognition, we
suggest a representation scheme for the phonological
changes, and a language model over pairs of morphological
and phonological tags (we call it the pairwise language
model). A hierarchical morphological tag set derived from
the one used in written text analysis (Lee and Lee, 1992)
is used, and a phonological tag set is constructed from the
Korean standard pronunciation rules (of Education, 1991).
Performance of the model is tested through an experimental
TDNN (time-delayed neural network) speech recognition
system. The proposed model is readily extensible to new
vocabularies and new domains by adding new dictionary
entries for the necessary morphemes, and can be refined to
bigram or trigram probabilistic models to give better
recognition results.

2 Declarative modeling of Korean
phonological rules

Phonological changes during speech recognition in 
Korean are modeled with phoneme-sequence-to- 
morpheme dictionary entries and a binary connec- 
tivity matrix. 

Phoneme-sequence-to-morpheme dictionary 

Figure 1 shows a sample entry of the
phoneme-sequence-to-morpheme dictionary. For the phoneme
sequence [n u n], two morphemes are in the dictionary: one
is an adnominal verb-ending and the other is a noun-ending.
The figure shows the left and right morphological tags and
the left and right phonological tags for the adnominal
verb-ending case.¹

The morphological tag "eCNMG" says that the morpheme is a
verb-ending (e) that makes a complex sentence (C),
specifically an inner sentence (N), through a noun phrase
construction (M), and that it is an adnominal verb-ending
(G). The phonological tag "P-n" says that the morpheme is
not changed at all (-) and that its first phoneme (its last
phoneme, for the right phonological tag) is [n]. The "P" in
"P-n" marks it as a phonological tag. A phonological tag of
the form "Pa=b" (see Figures 2 and 4) means that 'a' is
pronounced as [b], and "Pa2b" (see Figures 3 and 4) means
that 'a' is pronounced as [b] by the neutralization
phenomenon.

¹ A phoneme sequence can be a sequence of morphemes due to
contraction, especially in spontaneous dialogues, and can
have different left and right morphological categories.

Figure 1: Sample dictionary entry for [n u n] (labels
surviving from the figure: left morph. tag eCNMG, morpheme
root form, right morph. tag eCNMG, phonological tags P-n
and P-n)

Figure 2: Sample phonological connectivity matrix for
consonant assimilation (tags surviving from the figure:
Pt=n, Ps=n, Pss=n, P-n, P-m, P*=n)
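As an illustration only, the three tag forms described above ("P-n", "Pa=b", "Pa2b") can be split into their parts with a small parser. This is a minimal sketch, not the authors' implementation: the names `PhonTag` and `parse_phon_tag` and the regular expression are assumptions introduced here, and the sketch assumes that source and surface phonemes are written in lowercase Yale romanization, with "X" for a deleted phoneme and "PEND" for the pause tag.

```python
import re
from typing import NamedTuple


class PhonTag(NamedTuple):
    kind: str      # "unchanged", "changed", "neutralized", or "pause"
    source: str    # orthographic consonant(s); empty when nothing changed
    surface: str   # phoneme(s) actually pronounced ("X" = nothing)


# Hypothetical pattern: "P", optional source, one operator (-, = or 2), surface.
_TAG = re.compile(r"^P(?P<src>[a-z*?]*)(?P<op>[-=2])(?P<dst>[a-zX*?]+)$")
_KINDS = {"-": "unchanged", "=": "changed", "2": "neutralized"}


def parse_phon_tag(tag: str) -> PhonTag:
    """Split a phonological tag such as 'P-n', 'Pph=m' or 'Plk2k'."""
    if tag == "PEND":  # special tag for the pause
        return PhonTag("pause", "", "")
    m = _TAG.match(tag)
    if m is None:
        raise ValueError(f"not a phonological tag: {tag!r}")
    return PhonTag(_KINDS[m["op"]], m["src"], m["dst"])
```

For example, `parse_phon_tag("Plk2k")` yields a "neutralized" tag with source "lk" and surface "k".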

Binary connectivity matrix

While the dictionary keeps the information about how a
single morpheme is changed phonologically, the phonological
binary connectivity matrix keeps the collocational
information about the pronunciations of two adjacent
morphemes.

Figure 2 shows the connectivity matrix entries for the
consonant assimilation phenomenon in Korean. These entries
say that a morpheme whose last consonant 't s ss'² is
changed into [n] can be followed by a morpheme whose first
phoneme is [n] or [m]. Wildcard characters (*, ?) can be
used to reduce the number of entries in the matrix.
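The matrix lookup with wildcards can be sketched as follows. This is a hedged illustration rather than the paper's data structure: the list `CONNECTIVITY`, the function `can_follow`, and the choice to collapse the Pt=n and Ps=n rows into a single "P?=n" pattern are assumptions made here, and shell-style matching from Python's standard `fnmatch` module stands in for whatever wildcard semantics the original system used.

```python
from fnmatch import fnmatchcase

# Hypothetical matrix rows: (right tag of the left morpheme,
#                            left tag of the right morpheme).
# "P?=n" covers the one-letter codas 't' and 's'; 'ss' needs its own row.
CONNECTIVITY = [
    ("P?=n", "P-n"),
    ("P?=n", "P-m"),
    ("Pss=n", "P-n"),
    ("Pss=n", "P-m"),
]


def can_follow(left_tag: str, right_tag: str) -> bool:
    """True if some matrix row licenses left_tag followed by right_tag."""
    return any(
        fnmatchcase(left_tag, left_pat) and fnmatchcase(right_tag, right_pat)
        for left_pat, right_pat in CONNECTIVITY
    )
```

With these rows, `can_follow("Ps=n", "P-m")` holds, while `can_follow("Ps=n", "P-k")` does not, because no row allows an assimilated [n] before an unchanged [k].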

Generally, we apply the following two guidelines for the
phonological rule modeling.

• Make a new dictionary entry for each morphologically
conditioned phonological change: Some phonological changes,
such as vowel contraction and neutralization, happen only
in specific morphemes in a specific collocational relation.
In these cases, registering all the phonologically changed
morphemes or morpheme sequences is preferred.

• Represent the final change when more than one change
occurs: When more than one phonological change occurs for a
morpheme or between morphemes, register only the final form
of each morpheme rather than all the intermediate forms.
This strategy increases the number of dictionary entries
but eliminates successive rule application.

² We use the notation 't s ss' to mean 't', 's', or 'ss'
throughout this paper.

3 Representative phonological 
modeling examples 

In this section, the major Korean pronunciation rules
(text-to-speech rules) (of Education, 1991) are explained,
and their modeling (for speech-to-text conversion) using
the dictionary and the connectivity matrix is described.³

Yale romanization is adopted to represent the Korean
phonemes.

Neutralization

In Korean, only 7 consonants are pronounced as a syllable
coda. This is called neutralization or consonant cluster
simplification, and it happens when the morpheme is
followed by a pause or a consonant.

The following are some examples:

    "takk+ta"    [tak-tta]    (clean)         'kk' => [k]
    "pwu-ekh"    [pwu-ek]     (kitchen)       'kh' => [k]
    "talk+kwa"   [tak-kkwa]   (chicken and)   'lk' => [k]
    "os+kwa"     [ot-kkwa]    (clothes and)   's'  => [t]
    "nelp+ko"    [nel-kko]    (wide and)      'lp' => [l]
    "celm+ko"    [cem-kko]    (young and)     'lm' => [m]

Figure 3 shows the dictionary entry for the neutralized
"talk" (chicken) and the corresponding connectivity matrix
entry. "PEND" is a special tag for the pause. For each
"Pa2b" tag, a connectivity with "PEND" is added in the
matrix.
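The rule that every "Pa2b" tag gets a connectivity with "PEND" can be generated mechanically. The sketch below is an assumption-laden illustration: the table `NEUTRALIZED` contains only the six codas from the examples above, not the full rule set, and the function names are invented here.

```python
# Coda -> neutralized pronunciation, taken only from the examples above.
NEUTRALIZED = {
    "kk": "k",
    "kh": "k",
    "lk": "k",
    "s": "t",
    "lp": "l",
    "lm": "m",
}


def neutralization_tag(coda: str) -> str:
    """Build the 'Pa2b' tag for a neutralized coda, e.g. 'lk' -> 'Plk2k'."""
    return f"P{coda}2{NEUTRALIZED[coda]}"


def pause_entries() -> list[tuple[str, str]]:
    """One connectivity matrix row per 'Pa2b' tag, pairing it with PEND."""
    return [(neutralization_tag(coda), "PEND") for coda in NEUTRALIZED]
```

Rows licensing a following consonant would be added in the same way, one per pair of tags allowed by the pronunciation rules.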

Figure 3: Dictionary entry for [t a k] and the
corresponding connectivity matrix entry (labels surviving
from the figure: tak, morpheme talk, PEND)

³ The pronunciation rules cover intra-word and inter-word
phonological changes.

Glottalization

First consonants 'k t p s c' after last consonants
'k(kk, kh, ks, lk) t(s, ss, c, ch, th) p(ph, lp, lph, ps)'
are pronounced as [kk tt pp ss cc], respectively. A
verb-ending's first consonants 'k t s c' after a verb-stem
with 'n(nc) m(lm) lp lth' as its last consonants are
pronounced as [kk tt ss cc].

Here are some examples:

    "ppet+ta"    [ppet-tta]   (stretch)        't' => [tt]
    "iss+ten"    [it-tten]    (have existed)   't' => [tt]
    "talk+kwa"   [tak-kkwa]   (chicken and)    'k' => [kk]
    "ulp+ta"     [up-tta]     (recite)         't' => [tt]

Figure 4 shows the example of "talk+kwa", which is
pronounced as [tak-kkwa]. The connectivity matrix says that
a morpheme whose first consonant is changed from 'k' into
[kk] can follow a morpheme whose last consonant 'lk' is
neutralized as [k].

Figure 4: Dictionary entry for [kk wa], and the
corresponding connectivity matrix entry (tags surviving
from the figure: Plk2k, Pk=kk)

Figure 5: Dictionary entries for [a m] and [m a n], and the
corresponding connectivity matrix entry

Assimilation

Last consonants 'k(kk, kh, ks, lk) t(s, ss, c, ch, th, h)
p(ph, lp, lph)' followed by 'n m' are pronounced as
[ng n m]. First consonant 'l' following the last consonants
'm ng' is pronounced as [n]. First consonant 'l' following
the last consonants 'k p' is also pronounced as [n]. 'n'
following or followed by 'l' is pronounced as [l]. The
first consonant 'n' following 'lh lth' is also pronounced
as [l].

    "mek+nun"    [meng-nun]   (eating)       'k'  => [ng]
    "iss+nun"    [in-nun]     (existing)     'ss' => [n]
    "tam-lyek"   [tam-nyek]   (courage)      'l'  => [n]
    "aph+man"    [am-man]     (only front)   'ph' => [m]

Figure 5 shows an example in which two consecutive
phonological changes occur: "aph" (front) is first
neutralized to [a p] and then assimilated to [a m].
Following the general guidelines in section 2, we model
this phenomenon with the single tag Pph=m. This entry in
the connectivity matrix obviates the sequential application
of several phonological rules.
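The "final form only" guideline can be made concrete for nasal assimilation: each last consonant is mapped straight to its final pronunciation, so the intermediate neutralized form never appears in a tag. The sketch below assumes the coda classes exactly as listed in the assimilation rule above. The names `CODA_CLASSES`, `NASALIZED`, `final_assimilation_tag` and `assimilation_entries` are hypothetical, and the "P-n"/"P-m" right-hand tags follow the Figure 2 description.

```python
# Coda classes from the assimilation rule: representative -> member codas.
CODA_CLASSES = {
    "k": ["k", "kk", "kh", "ks", "lk"],
    "t": ["t", "s", "ss", "c", "ch", "th", "h"],
    "p": ["p", "ph", "lp", "lph"],
}
# Pronunciation of each class before a following 'n' or 'm'.
NASALIZED = {"k": "ng", "t": "n", "p": "m"}


def final_assimilation_tag(coda: str) -> str:
    """Tag recording only the final form, e.g. 'ph' -> 'Pph=m'."""
    for representative, members in CODA_CLASSES.items():
        if coda in members:
            return f"P{coda}={NASALIZED[representative]}"
    raise KeyError(coda)


def assimilation_entries() -> list[tuple[str, str]]:
    """Matrix rows: an assimilated coda may precede a first [n] or [m]."""
    return [
        (final_assimilation_tag(coda), right)
        for members in CODA_CLASSES.values()
        for coda in members
        for right in ("P-n", "P-m")
    ]
```

Here "aph" receives the tag Pph=m directly, without a separate Pph2p step followed by a second rule.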

Consonant contraction

'h(nh, lh)' followed by 'k t c' is combined with that
'k t c' and pronounced as [kh th ch]. Final consonants
'k(lk) t p(lp) c(nc)' followed by 'h' are merged with that
'h' and pronounced as [kh th ph ch]. The following are some
examples:

    "noh+ko"       [no-kho]       (put down)   'h'+'k' => [kh]
    "manh+ko"      [man-kho]      (many)       'h'+'k' => [kh]
    "talh+ci"      [tal-chi]      (wear out)   'h'+'c' => [ch]
    "palk+hi+ta"   [pal-khi-ta]   (lighten)    'k'+'h' => [kh]

Figure 6 shows the case of "noh+ko". The 'h' and 'k' are
merged into [kh]. The phonological tag "Ph=X" means that
'h' disappears (is changed to X, i.e. nothing).

Figure 6: Dictionary entries for [n o] and [kh o], and the
corresponding connectivity matrix entry

Figure 7: The TDNN-based speech recognizer

Others

"cye ccye chye" in a word's conjugational form are
pronounced as [ce cce che]. For example,

    "ka-ci+e" => "ka-cye"   [ka-ce]   (have)   'cye'  => [ce]
    "cci+e" => "ccye"       [cce]     (cook)   'ccye' => [cce]

Since the desyllabification is morphologically conditioned,
the dictionary entries model the phenomenon according to
our general guidelines. So [k a c e] has the following
morphological forms in the dictionary:

    [kace]   "ka-ci"     (morpheme root form)
             "ka-ci+e"   (root + sentential ending)
             "ka-ci+e"   (root + connective verb-ending)
             "ka-ci+e"   (root + aux. connective verb-ending)

However, 'yey' not in the syllables "yey lyey" can be
pronounced as [ey]. In these cases, two distinct entries
are made in the dictionary for each morpheme.

In this way, we modeled all the Korean pronunciation rules
in about 1000 entries of the phoneme-sequence-to-morpheme
dictionary and more than 500 lines of the binary
phonological connectivity matrix.



4 Pair-wise language model 

The phonological connectivity matrix developed in the
previous section, coupled with the morphological
connectivity matrix, is used as a pair-wise language model
for a continuous Korean speech recognizer. The
morphological connectivity matrix is constructed similarly
to model the Korean morphotactics using the morphological
tags in the dictionary (Lee and Lee, 1992).

Figure 7 shows the architecture of the TDNN-based
continuous speech recognizer. The TDNN-based phoneme
recognizer gives a sequence of phoneme vectors for the
input speech, and this phoneme sequence is decoded by the
Viterbi lexical decoder. A tree-structured
phoneme-sequence-to-morpheme dictionary is used in the
lexical decoding phase, and a morpheme graph is extracted
after the pair-wise language model is applied. The language
model checks whether each adjacent pair of morphemes in the
graph is connectable both morphologically and
phonologically.
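The adjacent-pair check can be sketched as a filter over candidate edges of the morpheme graph. Everything named here is hypothetical: the `Morpheme` record, the two matrices held as sets of tag pairs, and the morphological tags "NOUN" and "PARTICLE", which stand in for the paper's hierarchical tag set. Only the phonological tags Plk2k and Pk=kk come from the "talk+kwa" example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Morpheme:
    phonemes: str     # recognized phoneme sequence
    root: str         # morpheme root form
    left_morph: str   # left / right morphological tags
    right_morph: str
    left_phon: str    # left / right phonological tags
    right_phon: str


# Hypothetical binary matrices, stored as sets of allowed (left, right) pairs.
MORPH_MATRIX = {("NOUN", "PARTICLE")}
PHON_MATRIX = {("Plk2k", "Pk=kk"), ("Plk2k", "PEND")}


def connectable(left: Morpheme, right: Morpheme) -> bool:
    """A pair survives only if both matrices license it."""
    return (
        (left.right_morph, right.left_morph) in MORPH_MATRIX
        and (left.right_phon, right.left_phon) in PHON_MATRIX
    )


def prune_edges(edges):
    """Keep only the adjacent morpheme pairs the language model accepts."""
    return [(a, b) for a, b in edges if connectable(a, b)]


tak = Morpheme("tak", "talk", "NOUN", "NOUN", "P-t", "Plk2k")
kkwa = Morpheme("kkwa", "kwa", "PARTICLE", "PARTICLE", "Pk=kk", "P-a")
kwa = Morpheme("kwa", "kwa", "PARTICLE", "PARTICLE", "P-k", "P-a")
kept = prune_edges([(tak, kkwa), (tak, kwa)])  # only (tak, kkwa) remains
```

The unglottalized hypothesis [kwa] is rejected after the neutralized [tak] because no phonological matrix row pairs Plk2k with P-k, even though the morphological pair is allowed.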

The suggested model using the connectivity matri- 
ces for the phonological tags and the morphological 
tags is easy to construct, easy to maintain, and do- 
main independent. A new morpheme can be added 
by coding one or more dictionary entries correspond- 
ing to its phonological variations. 

5 Experiments 

Performance of the pairwise language model is tested using
the TDNN-based phoneme recognizer. Input speech is sampled
at 16 kHz and the mel-scaled filter-bank output is used as
the recognizer's input. The TDNN phoneme recognizer is
trained for all 39 Korean phonemes on 75 carefully selected
sentences (a phone-balanced corpus). Using this recognizer,
we perform Viterbi lexical decoding with the
tree-structured phoneme-sequence-to-morpheme dictionary and
apply the proposed pairwise language model. For 321 new
sentences, applying the language model produces 92.6%
correct morphemes under 70% correct phoneme recognition
performance (Figure 8). The evaluation is based on the DP
best matching of the morpheme graphs with the correct
morpheme sequences.

Figure 8: Morpheme recognition results for 321 new
sentences
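The paper does not spell out the DP matching procedure. As a simplified sketch under stated assumptions, the code below scores a single hypothesis path (not a full morpheme graph) against the correct morpheme sequence with a longest-common-subsequence dynamic program, and reports matched morphemes as a percentage of the reference. Both function names and the scoring choice are assumptions made here.

```python
def count_matches(hypothesis: list[str], reference: list[str]) -> int:
    """Maximum number of in-order morpheme matches (LCS length)."""
    rows, cols = len(hypothesis) + 1, len(reference) + 1
    # table[i][j] = best match count between hypothesis[:i] and reference[:j]
    table = [[0] * cols for _ in range(rows)]
    for i, hyp in enumerate(hypothesis, start=1):
        for j, ref in enumerate(reference, start=1):
            if hyp == ref:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[-1][-1]


def percent_correct(hypothesis: list[str], reference: list[str]) -> float:
    """Share of reference morphemes recovered by the hypothesis."""
    if not reference:
        raise ValueError("reference sequence must not be empty")
    return 100.0 * count_matches(hypothesis, reference) / len(reference)
```

Matching against a graph would take the best score over all paths, which the same recurrence supports if it is run over graph nodes instead of list positions.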

6 Conclusion 

In this paper, a new scheme to represent phonologi- 
cal changes in Korean is suggested. A pair-wise lan- 
guage model of morphological and phonological tags 
is proposed for continuous Korean speech recogni- 
tion. The proposed model has the following advan- 
tages in phonological modeling for Korean speech 
recognition: 

• domain independent, 

• easy to construct, 

• easy to maintain, 

• easy to add a new vocabulary. 

The pairwise language model integrates speech recognition
and natural language processing at the morpheme level, and
the morpheme-level integration provides the full-fledged
morphological/phonological processing that is essential for
agglutinative and morphologically complex languages, such
as Korean and Japanese. The model can be extended to
categorial bigram models, which are widely used in Korean
text tagging systems.